home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.lang.c
- Path: cwi.nl!dik
- From: dik@cwi.nl (Dik T. Winter)
- Subject: Re: Matrix Multiplication
- Message-ID: <DLyvL5.9uH@cwi.nl>
- Sender: news@cwi.nl (The Daily Dross)
- Nntp-Posting-Host: chrysant.cwi.nl
- Organization: CWI, Amsterdam
- References: <822792812snz@genesis.demon.co.uk> <DLwzv9.9wu@cwi.nl> <822923024snz@genesis.demon.co.uk>
- Date: Mon, 29 Jan 1996 23:51:05 GMT
-
- In article <822923024snz@genesis.demon.co.uk> fred@genesis.demon.co.uk writes:
- > Right. Since in my version c is only ever written to cache line reads
- > are not necessarily an issue.
- { k innermost in c[i][j] += a[i][k] * b[k][j] .}
- > are not necessarily an issue. It is also outside the innermost loop so
- > (in your case where N=1000) a cache miss compared to a 1000 iteration
- > loop (with 1000 potential cache misses for b) should be insignificant.
- > Indeed the chance of being able to accumulate the result in the inner loop
- > in a register is likely to be more significant.
- See below.
- >
- > >Timings on an SGI with a 100 MHz MIPS R4k (in seconds with N = 1000, 10
- > >multiplications):
- This was an error in my posting while the array sizes where 1000x1000
- the computations were performed on 100x100 sections (and this makes
- a difference).
- > >i innermost 11.08
- > >k innermost 5.92
- > >k innermost + scalar 4.60
- > >j innermost 2.59
- >
- [ try using temp for a[i][k] when j is the innermost loop.]
- > since that is an optimisation the compiler definitely can't make for itself
- > if it can't tell that a and c are distinct.
-
- I just tried it (and also a temp for b[k][j] when i is innermost and it
- makes absolutely no noticeable difference (as expected). While the
- compiler is not allowed to keep a[i][k] in a register, the value will
- be kept in cache so each load of a[i][k] is just a cache read (unless
- a and c overlap of course).
- >
- > Actually this makes it clear why your version works fast - *everything*
- > in the inner loop caches well. The only disadvantage is that the c update
- > requires an 'external' read-modify-write sequence. The performance of that
- > will depend on how write caching is implemented.
-
- There is no need for an 'external' read-modify-write sequence! When an
- element of c is updated it just writes out to the cache. When an element
- of a or b loads that for some reason overlaps with the c element the
- effect is just a cache hit. The cache makes aliasing transparent. It
- depends on the workings of the cache whether a write to the cache also
- results in a direct write to memory or whether that can be delayed until
- there is no bus activity (or required because of replacement strategy).
-
- It is important to keep in mind that with increasing processor speeds
- a cache miss may result in a load taking several cycles (I know of one
- processor where a secondary cache miss results in a load of 157 cycles,
- compared to a 2 cycle load on a primary cache hit).
-
- Using the j index innermost most likely gives best performance as it
- does best with trying to optimize the number of cache hits, and timing
- suggest that it is the most insensitive to variations of the spacing
- of elements. While in some cases the "k innermost + scalar" performs
- spectaculary well (and I have yet to analyze why) there are many cases
- that it performs much worse.
-
- > Maybe C could do as well as FORTRAN in this case.
-
- Oh, it can (and in many other cases as well). But it depends a bit on
- processor architecture. Caches help a lot to alleviate the aliasing
- problem, but caches ought to be large enough and on most processors
- they are just way too small. Also the cache replacement strategy
- implemented, the cache-line size and the set-associativity of the
- cache make great differences. But Fortran suffers as well here. And
- were it will not fly is on vector-processors, on those there is nothing
- to help with possible aliasing. (On the other hand, I just had a try
- on a Cray C-90 and I do not understand the timings there... I have
- to analyze a bit more I think.)
- --
- dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924098
- home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
-